On the Geographic Location of Internet Resources
One relatively unexplored question about the Internet's physical structure concerns the geographical location of its components: routers, links and autonomous systems (ASes). We study this question using two large inventories of Internet routers and links, collected by different methods and about two years apart. We first map each router to its geographical location using two different state-of-the-art tools. We then study the relationship between router location and population density; between geographic distance and link density; and between the size and geographic extent of ASes.
Our findings are consistent across the two datasets and both mapping methods. First, as expected, router density per person varies widely over different economic regions; however, in economically homogeneous regions, router density shows a strong superlinear relationship to population density. Second, the probability that two routers are directly connected is strongly dependent on distance; our data is consistent with a model in which a majority (up to 75-95%) of link formation is based on geographical distance (as in the Waxman topology generation method). Finally, we find that ASes show high variability in geographic size, which is correlated with other measures of AS size (degree and number of interfaces). Small to medium ASes show wide variability in their geographic dispersal; however, all ASes exceeding a certain threshold in size are maximally dispersed geographically. These findings have many implications for the next generation of topology generators, which we envisage as producing router-level graphs annotated with attributes such as link latencies, AS identifiers and geographical locations.

National Science Foundation (CCR-9706685, ANI-9986397, ANI-0095988, CAREER ANI-0093296); DARPA; CAID
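The distance-dependent link model the abstract refers to can be made concrete with a small sketch of the Waxman construction; the parameter values below are illustrative assumptions, not the paper's fitted values:

```python
import math
import random

def waxman_graph(n, alpha=0.4, beta=0.1, seed=0):
    """Sketch of a Waxman random graph: nodes at random points in the unit
    square, with an edge between u and v formed with probability
    alpha * exp(-d(u, v) / (beta * L)), where L is the maximum possible
    distance. Parameter naming follows the common Waxman formulation."""
    rng = random.Random(seed)
    pos = [(rng.random(), rng.random()) for _ in range(n)]
    L = math.sqrt(2)  # diagonal of the unit square
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            d = math.dist(pos[u], pos[v])
            if rng.random() < alpha * math.exp(-d / (beta * L)):
                edges.append((u, v))
    return pos, edges
```

With a small decay length (beta), nearby pairs connect far more often than distant ones, which is the distance dependence the abstract reports for real router links.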
Analysis of OD Flows (Raw Data)
In a recent paper, "Structural Analysis of Network Traffic Flows," we analyzed the set of origin-destination (OD) traffic flows from the Sprint-Europe and Abilene backbone networks. This report presents the complete set of results from analyzing data from both networks. The results in this report are specific to the Sprint-1 and Abilene datasets studied in the above paper. The following results are presented here:
1 Rows of Principal Matrix (V)
1.1 Sprint-1 Dataset
1.2 Abilene Dataset
2 Set of Eigenflows
2.1 Sprint-1 Dataset
2.2 Abilene Dataset
3 Classifying Eigenflows
3.1 Sprint-1 Dataset
3.2 Abilene Dataset

Centre National de la Recherche Scientifique (CNRS), France; Sprint Labs; Office of Naval Research (N000140310043); National Science Foundation (ANI-9986397, CCR-0325701)
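The eigenflow decomposition whose results the sections above list can be sketched as an SVD of a (time x OD-flow) traffic matrix; the matrix below is a synthetic stand-in, not the Sprint-1 or Abilene measurements:

```python
import numpy as np

# Synthetic traffic matrix: 20 OD flows sharing one diurnal pattern, plus noise.
rng = np.random.default_rng(0)
t = np.arange(288)                                   # one day at 5-minute bins
diurnal = 100 + 50 * np.sin(2 * np.pi * t / 288)     # shared daily cycle
X = np.outer(diurnal, rng.uniform(0.5, 1.5, size=20))
X += rng.normal(0, 1.0, size=X.shape)                # measurement noise

# SVD of the mean-centered matrix: columns of U are the eigenflows (common
# temporal patterns); rows of V give each OD flow's weights on the eigenflows.
U, s, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
eigenflows = U
energy = s**2 / np.sum(s**2)   # fraction of variance captured per eigenflow
```

Because all flows here share one diurnal cycle, the first eigenflow captures almost all the energy; real backbone OD matrices similarly concentrate energy in a few eigenflows.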
Sampling Biases in IP Topology Measurements
Considerable attention has been focused on the properties of graphs derived from Internet measurements. Router-level topologies collected via traceroute-like methods have led some to conclude that the router graph of the Internet is well modeled as a power-law random graph. In such a graph, the degree distribution of nodes follows a distribution with a power-law tail.
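The sampling bias this abstract examines can be shown in miniature: degrees observed along shortest paths from a single source look very different from the true degrees, even on a graph with no power-law tail. The graph model and parameters below are illustrative, not the paper's measurement setup:

```python
import random
from collections import Counter, deque

def er_graph(n, p, seed=0):
    """Erdos-Renyi graph as an adjacency dict (degrees are Poisson-like,
    not power-law)."""
    rng = random.Random(seed)
    adj = {u: set() for u in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj

def bfs_tree_edges(adj, source):
    """Edges of one shortest-path tree, roughly what single-source
    traceroute-like probing to all destinations would reveal."""
    parent, seen, q = {}, {source}, deque([source])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                parent[v] = u
                q.append(v)
    return [(v, u) for v, u in parent.items()]

adj = er_graph(300, 0.05, seed=1)
true_deg = {u: len(vs) for u, vs in adj.items()}
sampled = Counter()                      # degrees as seen in the probe tree
for u, v in bfs_tree_edges(adj, source=0):
    sampled[u] += 1
    sampled[v] += 1
```

Most routers appear in the tree with degree 1 or 2 even though their true degree averages about 15, so the observed distribution is far more skewed than the underlying one.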
Mining anomalies using traffic feature distributions
The increasing practicality of large-scale flow capture makes it possible to conceive of traffic analysis methods that detect and identify a large and diverse set of anomalies. However, the challenge of effectively analyzing this massive data source for anomaly diagnosis is as yet unmet. We argue that the distributions of packet features (IP addresses and ports) observed in flow traces reveal both the presence and the structure of a wide range of anomalies. Using entropy as a summarization tool, we show that the analysis of feature distributions leads to significant advances on two fronts: (1) it enables highly sensitive detection of a wide range of anomalies, augmenting detections by volume-based methods, and (2) it enables automatic classification of anomalies via unsupervised learning. We show that using feature distributions, anomalies naturally fall into distinct and meaningful clusters. These clusters can be used to automatically classify anomalies and to uncover new anomaly types. We validate our claims on data from two backbone networks (Abilene and Geant) and conclude that feature distributions show promise as a key element of a fairly general network anomaly diagnosis framework.
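The entropy summarization the abstract describes reduces a whole feature distribution to one number; a minimal sketch (the normalization choice here is a common convention, an assumption rather than the paper's exact definition):

```python
import math
from collections import Counter

def normalized_entropy(values):
    """Sample entropy of a feature distribution (e.g. destination ports seen
    in a traffic window), normalized to [0, 1] by the log of the number of
    distinct values observed."""
    counts = Counter(values)
    n = sum(counts.values())
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    distinct = len(counts)
    return h / math.log(distinct) if distinct > 1 else 0.0
```

A port scan spreads traffic over many destination ports, pushing port entropy toward 1 even when total volume barely changes, which is why entropy catches anomalies that volume-based detectors miss.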
Multivariate online anomaly detection using kernel recursive least squares
High-speed backbones are regularly affected by various kinds of network anomalies, ranging from malicious attacks to harmless large data transfers. Different types of anomalies affect the network in different ways, and it is difficult to know a priori how a potential anomaly will exhibit itself in traffic statistics. In this paper we describe an online, sequential anomaly detection algorithm that is suitable for use with multivariate data. The proposed algorithm is based on the kernel version of the recursive least squares algorithm. It assumes no model for network traffic or anomalies, and constructs and adapts a dictionary of features that approximately spans the subspace of normal behaviour. The algorithm raises an alarm immediately upon encountering a deviation from the norm. Through comparison with existing block-based offline methods based upon Principal Component Analysis, we demonstrate that our online algorithm is equally effective but has much faster time-to-detection and lower computational complexity. We also explore minimum volume set approaches in identifying the region of normality.
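The dictionary-of-features idea can be illustrated with a stripped-down sketch. The full kernel RLS recursions also maintain regression weights; here only the approximate-linear-dependence (ALD) test that grows the dictionary is kept, with the projection error itself used as a novelty score. Kernel width, thresholds, and all names below are illustrative assumptions:

```python
import numpy as np

def rbf(x, y, sigma=1.0):
    """Gaussian (RBF) kernel."""
    return np.exp(-np.linalg.norm(np.asarray(x, float) - np.asarray(y, float))**2
                  / (2 * sigma**2))

class KernelNoveltyDetector:
    """Sketch: maintain a small dictionary spanning normal behaviour in
    feature space; score each new sample by its projection error (ALD)."""
    def __init__(self, nu=0.1, sigma=1.0):
        self.nu = nu          # ALD threshold: grow dictionary above this
        self.sigma = sigma
        self.dict = []        # representative samples
        self.Kinv = None      # inverse kernel matrix of the dictionary

    def score(self, x):
        if not self.dict:
            return float("inf")
        kvec = np.array([rbf(d, x, self.sigma) for d in self.dict])
        # error of projecting phi(x) onto the span of the dictionary
        return rbf(x, x, self.sigma) - kvec @ self.Kinv @ kvec

    def update(self, x):
        delta = self.score(x)
        if delta > self.nu:   # approximately linearly independent: add it
            self.dict.append(np.asarray(x, dtype=float))
            K = np.array([[rbf(a, b, self.sigma) for b in self.dict]
                          for a in self.dict])
            self.Kinv = np.linalg.inv(K + 1e-8 * np.eye(len(self.dict)))
        return delta
```

An operator would feed traffic feature vectors in arrival order and alarm when the score jumps, which mirrors the immediate time-to-detection the abstract emphasizes.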
Diagnosing Network-Wide Traffic Anomalies
Anomalies are unusual and significant changes in a network's traffic levels, which can often involve multiple links. Diagnosing anomalies is critical for both network operators and end users. It is a difficult problem because one must extract and interpret anomalous patterns from large amounts of high-dimensional, noisy data.
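One well-known way to extract such patterns from high-dimensional link data is the PCA subspace method: model normal traffic with the top-k principal components and flag timepoints whose residual energy (squared prediction error) is large. All data and thresholds below are synthetic assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
T, links, k = 500, 10, 3
# "Normal" traffic: low-rank structure across links, plus noise.
base = rng.normal(size=(T, k)) @ rng.normal(size=(k, links))
Y = base + rng.normal(scale=0.1, size=(T, links))
Y[250] += 3.0        # inject a network-wide volume anomaly at t = 250

Yc = Y - Y.mean(axis=0)
_, _, Vt = np.linalg.svd(Yc, full_matrices=False)
P = Vt[:k].T                        # principal subspace (top-k components)
residual = Yc - Yc @ P @ P.T        # part of traffic outside normal subspace
spe = np.sum(residual**2, axis=1)   # squared prediction error per timepoint
threshold = spe.mean() + 5 * spe.std()   # crude threshold for illustration
anomalies = np.where(spe > threshold)[0]
```

The injected spike barely registers on any single link's time series, but it stands out sharply in the residual subspace, which is the sense in which multi-link anomalies become diagnosable.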